fix(process): give backend workers a parent-death safety net #10639
Merged
…fully

Symptom: a backend model-worker subprocess (the per-model gRPC server LocalAI spawns) can be orphaned and linger, holding VRAM and its listen port, if the LocalAI process is killed non-gracefully (e.g. a supervisor's graceful-shutdown grace period elapses and LocalAI is SIGKILLed) before its own teardown runs.

Root cause: LocalAI's graceful teardown (pkg/signals/handler.go installs the SIGINT/SIGTERM handler; core/cli/run.go registers app.Shutdown -> ModelLoader.StopAllGRPC -> process.Stop in pkg/model/process.go) only runs when LocalAI receives a catchable signal and survives long enough to run its handlers. Backends are spawned via github.com/mudler/go-processmanager v0.1.1, whose getSysProcAttr() sets Setpgid:true (own process group, so the group can be signalled) but never PR_SET_PDEATHSIG/Pdeathsig, and exposes no Config field or option for a caller to inject/extend SysProcAttr. LocalAI fully delegates spawning to that library (it never builds the exec.Cmd itself), so it cannot set a kernel parent-death signal at the spawn site. If LocalAI is SIGKILLed, nothing tells the backend to exit and it is reparented to init.

Fix: add a best-effort, backend-side safety net at the one shared choke point every out-of-process Go backend routes through: grpc.StartServer / RunServer in pkg/grpc. On startup it captures getppid() and polls; when the process is reparented (getppid changes / becomes 1, the standard POSIX signal the original parent died) it logs and self-terminates. getppid() reparent detection is portable (Linux + macOS), unlike Linux-only PR_SET_PDEATHSIG. Toggle via LOCALAI_BACKEND_PARENT_WATCH (default on; off on Windows) and LOCALAI_BACKEND_PARENT_WATCH_INTERVAL. This is strictly a backstop alongside the existing graceful SIGTERM->grace->SIGKILL teardown, which is unchanged.

Scope/limitations: covers Go-based backends (everything using pkg/grpc). The C++ backends (e.g. llama-cpp) and Python backends do not route through pkg/grpc and are not covered by this mechanism; they would each need an equivalent parent-death check (follow-up). The fully general fix is for go-processmanager to expose SysProcAttr injection so LocalAI can set Pdeathsig at spawn for every backend regardless of language (suggested upstream follow-up; out of scope for this LocalAI-only PR).

Test: pkg/grpc/parentwatch_test.go builds a real test -> middle -> grandchild process tree, lets the middle process exit to orphan the grandchild running the real watchParentDeath, and asserts it detects the reparent and self-terminates. Unix-only (build-tagged), runs in CI (Linux).

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
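The capture-and-poll check this commit describes can be sketched in a few lines. The following is a minimal Python illustration, not the code in pkg/grpc/parentwatch.go: the names `parent_died` and `start_parent_watch`, the injectable `on_death` and `getppid` parameters, the log text, and the exit code are all assumptions made for the example.

```python
import os
import sys
import threading
import time


def parent_died(original_ppid: int, current_ppid: int) -> bool:
    """True when getppid() has changed or become 1, i.e. we were reparented."""
    return current_ppid != original_ppid or current_ppid == 1


def _log_and_exit() -> None:
    # Hypothetical default action: log, then exit without running cleanup.
    sys.stderr.write("parent process died; terminating backend\n")
    os._exit(1)


def start_parent_watch(interval_s=2.0, on_death=_log_and_exit, getppid=os.getppid):
    """Poll getppid() in a daemon thread and call on_death once on reparenting.

    Returns the watcher thread, or None when the process is already
    orphaned at startup (getppid() <= 1) and there is nothing to watch.
    """
    original = getppid()
    if original <= 1:
        return None

    def loop() -> None:
        while True:
            time.sleep(interval_s)
            if parent_died(original, getppid()):
                on_death()
                return

    thread = threading.Thread(target=loop, name="parent-watch", daemon=True)
    thread.start()
    return thread
```

Making `getppid` and `on_death` injectable is only a convenience for exercising the loop without killing a real parent; a backend would call `start_parent_watch()` with the defaults before serving.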
The Go parent-death watcher (pkg/grpc/parentwatch.go, commit 772b435) only protects backends that route through pkg/grpc. C++ and Python backends don't, so the originally-reported case (the llama.cpp gRPC worker surviving a non-graceful LocalAI death) was still uncovered.

Extend the same best-effort backstop to both languages, reusing the exact mechanism and semantics:
- capture getppid() at startup, skip if already orphaned (<=1)
- a background thread polls getppid() and self-exits on reparenting (getppid() != orig || == 1), portable across Linux/macOS, no-op on Windows
- same env vars: LOCALAI_BACKEND_PARENT_WATCH (default on; falsy false/0/no/off disable) and LOCALAI_BACKEND_PARENT_WATCH_INTERVAL (default 2s; accepts Go-style durations like 500ms/2s/1m)

C++: implemented in backend/cpp/llama-cpp (the reported, most-used C++ backend) as a dependency-free header parent_watch.h, wired into grpc-server.cpp's main() and copied at build time via prepare.sh. C++ backends have no shared server scaffolding, so other C++ backends (ds4, ik-llama-cpp, privacy-filter, ...) are not yet covered and would each need the same one-line include+call as follow-ups.

Python: implemented once in the shared common/parent_watch.py and armed from common/grpc_auth.py's get_auth_interceptors(), the single helper every one of the 35 Python backends invokes while building its gRPC server, so all Python backends (and future ones) are covered with no per-backend edits and no duplicated implementation.

Tests (real process-tree reparent detection, mirroring the Go test):
- backend/cpp/llama-cpp/parent_watch_test.cpp (via run-unit-tests.sh)
- backend/python/common/parent_watch_test.py (python -m unittest)

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
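The environment-variable semantics listed in this commit (default on, a small set of falsy values, Go-style durations, off on Windows) could be parsed roughly as below. This is a sketch under stated assumptions, not the contents of common/parent_watch.py: the function names are hypothetical, only single-unit durations such as 500ms, 2s or 1m are handled (Go also accepts compound forms like 1m30s), and falling back to the default for unset, invalid, or zero values is a guess at reasonable behaviour.

```python
import re
import sys

_FALSY = {"false", "0", "no", "off"}
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
_DURATION = re.compile(r"^(\d+(?:\.\d+)?)(ms|s|m|h)$")

DEFAULT_INTERVAL_S = 2.0


def watch_enabled(env, platform=sys.platform) -> bool:
    """Default on; disabled by a falsy value or when running on Windows."""
    if platform.startswith("win"):
        return False
    value = env.get("LOCALAI_BACKEND_PARENT_WATCH", "")
    return value.strip().lower() not in _FALSY


def watch_interval(env) -> float:
    """Poll interval in seconds parsed from a single-unit Go-style duration."""
    raw = env.get("LOCALAI_BACKEND_PARENT_WATCH_INTERVAL", "").strip()
    match = _DURATION.match(raw)
    if not match:
        # Assumption: unset or unparseable values fall back to the default.
        return DEFAULT_INTERVAL_S
    seconds = float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    # Assumption: a zero interval is treated as invalid.
    return seconds if seconds > 0 else DEFAULT_INTERVAL_S
```

Both functions take the environment as a mapping, so a backend would pass `os.environ` while a test can pass a plain dict.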
04b474b to 94e3e06
Symptom

A backend model-worker subprocess, the per-model gRPC server LocalAI spawns (e.g. a llama.cpp/whisper/etc. worker), can be orphaned and linger, holding VRAM and its listen port, if the LocalAI process itself is killed non-gracefully (for example, a supervising process's graceful-shutdown grace period elapses and LocalAI is SIGKILLed) before LocalAI's own teardown runs. This was hit by a downstream project that supervises LocalAI as a child process.

Root cause
LocalAI does have a graceful teardown that works as designed:
- pkg/signals/handler.go installs signal.Notify(c, SIGINT, SIGTERM), runs registered handlers, then exits.
- The registered handler is app.Shutdown() (core/cli/run.go), which calls ModelLoader.StopAllGRPC() → process.Stop() (pkg/model/process.go).

That teardown only runs if LocalAI receives a catchable signal and survives long enough to run its handlers. If LocalAI is SIGKILLed, none of it runs.

Backends are spawned via github.com/mudler/go-processmanager v0.1.1. Its getSysProcAttr() (in the library's process_unix.go) sets Setpgid: true (intentional, so the graceful path can signal the backend's whole process group), but it never sets PR_SET_PDEATHSIG/Pdeathsig, and the library exposes no Config field or functional option to inject/extend SysProcAttr. LocalAI fully delegates spawning to that library (pkg/model/process.go calls process.New(...).Run(); it never builds the exec.Cmd itself), so LocalAI cannot set a kernel parent-death signal at the spawn site. When LocalAI dies without cleaning up, the backend is reparented to init and keeps running. There is no fallback that makes an orphaned backend self-terminate.

Fix
Add a best-effort, backend-side safety net that detects reparenting: on startup each backend captures getppid() and polls it; when the process is reparented (getppid() changes or becomes 1, the standard POSIX signal that the original parent has died) it logs and self-terminates. getppid() detection is portable across Linux and macOS, unlike Linux-only PR_SET_PDEATHSIG (which also has a false positive with a Go parent: the signal fires when the spawning thread exits, and the Go runtime may retire that thread while the process lives).

The same mechanism, env vars and semantics are now applied across all three backend languages LocalAI ships:
- Go: pkg/grpc/parentwatch.go, armed in the shared grpc.StartServer / grpc.RunServer choke point that every out-of-process Go backend routes through.
- C++: backend/cpp/llama-cpp/parent_watch.h, a dependency-free header wired into grpc-server.cpp's main() (and copied at build time via prepare.sh).
- Python: backend/python/common/parent_watch.py, armed from common/grpc_auth.py's get_auth_interceptors(), the single shared helper every Python backend invokes while building its gRPC server.

Shared configuration (identical across all three):
- LOCALAI_BACKEND_PARENT_WATCH: default on; falsy values false/0/no/off (case-insensitive) disable it; automatically off on Windows (different reparenting semantics).
- LOCALAI_BACKEND_PARENT_WATCH_INTERVAL: poll interval, default 2s; accepts Go-style durations (500ms, 2s, 1m) in every language for parity.
- The watcher is skipped when the process is already orphaned at startup (getppid() <= 1).

This is strictly a backstop alongside the existing graceful SIGTERM → grace → SIGKILL teardown, which is unchanged in all three languages. No shutdown timing, GracefulTimeout, or IsBusy() polling was touched.

Test coverage
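The test → middle → grandchild tree used by the tests in this section can be reproduced with os.fork. The sketch below is a simplified stand-in for backend/python/common/parent_watch_test.py, not a copy of it: the function name, the pipe-based reporting, and the timings are assumptions, and it only checks that the orphaned grandchild observes a changed parent PID rather than running the real watcher. It assumes a Unix platform.

```python
import os
import time


def orphan_detects_reparent(timeout_s: float = 5.0, poll_s: float = 0.05) -> bool:
    """Fork test -> middle -> grandchild, let middle exit, and report whether
    the grandchild saw getppid() change."""
    read_fd, write_fd = os.pipe()
    middle_pid = os.fork()
    if middle_pid == 0:
        # Middle process: record our own PID so the grandchild knows its
        # original parent even if we exit before it gets scheduled.
        os.close(read_fd)
        watched_parent = os.getpid()
        if os.fork() == 0:
            # Grandchild: poll until the parent PID differs, then report.
            deadline = time.monotonic() + timeout_s
            while time.monotonic() < deadline:
                if os.getppid() != watched_parent:
                    os.write(write_fd, b"1")
                    break
                time.sleep(poll_s)
            os._exit(0)
        os._exit(0)  # middle exits immediately, orphaning the grandchild

    # Test process: reap middle, then wait for the grandchild's report.
    os.close(write_fd)
    os.waitpid(middle_pid, 0)
    try:
        # Returns b"1" on detection, or b"" at EOF if the grandchild timed out.
        return os.read(read_fd, 1) == b"1"
    finally:
        os.close(read_fd)
```

Comparing against the recorded parent PID instead of testing for 1 keeps the check valid when a subreaper, rather than init, adopts the orphan.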
Each language has a real process-tree reparent test (test → middle → grandchild): the middle process exits to orphan the grandchild (running the real watcher), and the test asserts the watcher detects the reparent and self-terminates.
- Go: pkg/grpc/parentwatch_test.go
- C++: backend/cpp/llama-cpp/parent_watch_test.cpp (uses fork(2); standard library only, so it runs via the existing standalone backend/cpp/run-unit-tests.sh runner, with no CUDA/gRPC build needed; also buildable under ctest with -DLLAMA_GRPC_BUILD_TESTS=ON). The full backend build needs the llama.cpp + gRPC toolchain, so the watcher is verified by compiling and running its own translation unit standalone; the header is intentionally dependency-free precisely so this is possible.
- Python: backend/python/common/parent_watch_test.py (uses os.fork; standard library only)

Known limitations / follow-ups (not overclaiming)
- C++: covers the llama-cpp backend only. C++ backends have no shared server scaffolding (each backend/cpp/*/grpc-server.cpp has its own main/RunServer), so the watcher was added to the originally-reported, most-used backend (llama.cpp). The other C++ backends (ds4, ik-llama-cpp, privacy-filter) are not yet covered; each would need the same one-line #include "parent_watch.h" + start_parent_death_watcher() as a follow-up (the header is reusable as-is).
- Python: all Python backends are covered through the shared common/ choke point, with no per-backend edits.
- The fully general fix is for go-processmanager to expose SysProcAttr injection so LocalAI can set Pdeathsig at spawn for every backend regardless of language. That is a change to a separate repo and is intentionally out of scope for this LocalAI-only PR; it is suggested as an upstream follow-up.

🤖 Generated with Claude Code